Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…/lighteval into nathan-add-tests-for-metrics
Pull Request Overview
This pull request adds an automated testing framework for LightEval metrics to ensure their reliability and correctness. The automated testing system allows developers to define test cases with input/output pairs in JSON files, which are then automatically validated against metric implementations.
- Creates an automated testing framework for metrics with JSON-based test case definitions
- Moves existing unit tests from the tests/tasks folder to tests/unit/tasks for better organization
- Fixes broken metrics by correcting function signatures and implementing missing dependencies
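The JSON-driven mechanism described above can be sketched roughly as follows. The field names and file shape here are illustrative assumptions, not the PR's exact schema:

```python
import json

# Hypothetical test-case shape; the real schema in the PR may differ.
test_case_json = """
{
    "metric_class": "exact_match",
    "metric_params": {},
    "input": {"prediction": "Paris", "gold": "Paris"},
    "expected_output": 1.0
}
"""

def exact_match(prediction, gold):
    # Stand-in for a real lighteval metric implementation.
    return 1.0 if prediction.strip() == gold.strip() else 0.0

# Registry mapping metric names (as used in JSON files) to implementations.
METRICS = {"exact_match": exact_match}

def run_test_case(raw):
    """Load one JSON test case, run the metric, and compare to the expected output."""
    case = json.loads(raw)
    metric = METRICS[case["metric_class"]]
    result = metric(**case["input"])
    assert result == case["expected_output"], f"{result} != {case['expected_output']}"
    return result

run_test_case(test_case_json)
```

A framework like the one in this PR would discover such JSON files in a test_cases directory and generate one pytest case per file.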
Reviewed Changes
Copilot reviewed 57 out of 82 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/unit/tasks/test_registry.py | Updates custom task import paths to reflect new unit test folder structure |
| tests/unit/metrics/test_metrics_automated.py | Core automated testing framework implementation with test case models and execution logic |
| tests/unit/metrics/test_automated_metrics_pytest.py | Pytest integration for the automated testing framework |
| tests/unit/metrics/pytest.ini | Pytest configuration for metric testing |
| tests/unit/metrics/test_cases/*.json | JSON test case files for various metrics (stored in Git LFS) |
| src/lighteval/metrics/metrics_sample.py | Fixes broken metric implementations including function signatures and NLTK dependencies |
| src/lighteval/metrics/metrics_corpus.py | Adds NLTK download and fixes F1 score calculation |
| src/lighteval/metrics/metrics.py | Corrects F1 score averaging parameter |
| src/lighteval/metrics/imports/summac.py | Removes deprecated tokenizer parameter |
| src/lighteval/tasks/extended/lcb/main.py | Adds missing batched_compute parameter |
| pyproject.toml | Updates test dependencies |
clefourrier
left a comment
Missing a doc section in Adding a new metric to explain how to create a test file. A bunch of nits.
But overall, cool new mechanism, with clearer logic! GG
@@ -1 +1,2 @@
 *.json filter=lfs diff=lfs merge=lfs -text
+tests/unit/metrics/test_cases/*.json -filter -diff -merge text
do not use git-lfs for json files in the test_cases dir
Pull Request Overview
Copilot reviewed 59 out of 84 changed files in this pull request and generated 4 comments.
if metric_params != {}:
    metric = self.METRIC_CLASSES[metric_class].value
    metric_enum_value = copy.deepcopy(metric)(metric_params)
else:
    metric_enum_value = self.METRIC_CLASSES[metric_class].value
Line 126 should access the metric enum, not the value. It should be metric = self.METRIC_CLASSES[metric_class] without .value, then call metric.value on line 127.
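The suggested fix might look like the following self-contained sketch. `METRIC_CLASSES` and `DummyMetric` are illustrative stand-ins for the real lighteval enum and metric types, not the PR's actual code:

```python
import copy
from enum import Enum

class DummyMetric:
    """Hypothetical metric factory standing in for a real lighteval metric."""
    def __call__(self, params):
        # Mimics metric(metric_params): returns a configured metric object.
        self.params = params
        return self

class METRIC_CLASSES(Enum):
    # Illustrative member; the real enum holds the library's metric objects.
    exact_match = DummyMetric()

metric_class = "exact_match"
metric_params = {"strip": True}

if metric_params != {}:
    # Access the enum member first (no .value here)...
    metric = METRIC_CLASSES[metric_class]
    # ...then deep-copy its value before parameterizing, so the shared
    # enum member is never mutated.
    metric_enum_value = copy.deepcopy(metric.value)(metric_params)
else:
    metric_enum_value = METRIC_CLASSES[metric_class].value
```

Deep-copying the member's value (rather than the member itself) matters because `copy.deepcopy` on an `Enum` member returns the same singleton, so the original code risks configuring the shared metric object in place.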
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Adds a mechanism to auto-test metrics: when creating a metric, you now create a JSON file with test cases (input, output, and expected results).
- Moves unit tests to a tests/unit folder.
- Fixes broken metrics.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>